This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset, and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. These requirements form the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully annotated dataset with per-frame annotation of several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes, making it the most systematically constructed and annotated dataset to date. A multi-platform evaluation system that allows easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested in the VOT2014 challenge on the new dataset with 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art, since they outperform the standard baselines, resulting in a highly challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison, a new performance visualization technique is proposed.
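As a rough illustration of the attribute-based clustering idea mentioned above, the sketch below groups videos by per-video visual-attribute vectors and keeps one representative per cluster so that the selected subset spans the attribute space. The attribute names, the per-frame-fraction encoding, the use of k-means, and the nearest-to-centroid selection rule are all assumptions made for this example and are not the paper's exact construction procedure.

```python
# Minimal sketch of diversity-driven dataset selection, assuming each video
# is described by the fraction of its frames exhibiting each visual attribute.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical attribute set; the paper's actual attributes may differ.
ATTRIBUTES = ["occlusion", "illumination_change", "motion_change",
              "size_change", "camera_motion"]

rng = np.random.default_rng(0)
# Stand-in data: 300 candidate videos, one attribute vector per video.
video_features = rng.random((300, len(ATTRIBUTES)))

k = 25  # target dataset size: one representative video per cluster
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
labels = kmeans.fit_predict(video_features)

# Choose the video closest to each cluster centre as its representative,
# so no visually similar videos dominate the final dataset.
selected = []
for c in range(k):
    members = np.flatnonzero(labels == c)
    dists = np.linalg.norm(
        video_features[members] - kmeans.cluster_centers_[c], axis=1)
    selected.append(int(members[np.argmin(dists)]))

print(sorted(selected))  # indices of the diverse video subset
```

The design intent is that maximizing spread over attribute clusters, rather than hand-picking popular sequences, yields a dataset whose difficulty is distributed across distinct visual phenomena.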